NSF PAR Search | NSF Public Access Repository


“I’m not sure I heard you right, but I think I know what you mean” – investigations into the impact of speech recognition errors on response selection for a virtual human.

Harris, Vera; Braggs, Robert; Traum, David (March 2024, International Workshop on Spoken Dialogue Systems Technology)

Previous work has benchmarked multiple speech recognition systems in terms of Word Error Rate (WER) for speech intended for artificial agents. This metric allows us to compare recognizers in terms of the frequency of errors; however, errors are not equally meaningful in terms of their impact on understanding the utterance and generating a coherent response. We investigate how the actual recognition results of 10 different speech recognizers and models affect response appropriateness for a virtual human (Sergeant Blackwell), who was part of a museum exhibit, fielding questions “in the wild” from museum visitors. Results show a general correlation between WER and response quality, but this pattern doesn’t hold for all recognizers.
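For readers unfamiliar with the metric, WER is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between the recognizer output and a reference transcript, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not the evaluation code used in the paper; the function name `word_error_rate` and the example utterances are invented for illustration, and real evaluations usually normalize case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative sketch only: whitespace tokenization, no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical utterances: both hypotheses contain exactly one word error.
reference = "where were you born"
wer_substitution = word_error_rate(reference, "where were you worn")
wer_deletion = word_error_rate(reference, "where you born")
```

Both hypothetical outputs receive the same WER (one error in four reference words), even though replacing "born" plausibly damages the question more than dropping "were" does. That is the kind of difference in impact that a frequency-based metric cannot express.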
Full Text Available
Common Strategy Patterns of Persuasion in a Mission Critical and Time Sensitive Task

To, Claire; Nasihati Gilani, Setareh; Traum, David (August 2023, Semdial)
Lücking, Andy; Mazzocconi, Chiara; Verdonik, Darinka (Ed.)
Full Text Available
Developing a spoken dialogue system for the Choctaw language

Brixey, Jacqueline; Traum, David (February 2023, International Workshop on Spoken Dialogue Systems (IWSDS-2023))

We present the first spoken dialogue system for the Choctaw language in this paper. Choctaw is an endangered American indigenous language spoken by the Choctaw tribe. Previous work in this area created a text-based English-Choctaw bilingual chatbot, named Masheli, that primarily shared stories about animals. Additional work developed an automatic speech recognizer (ASR) to process spoken Choctaw. In this paper, we demo the Choctaw ASR together with the Masheli chatbot to form a dialogue system that allows the user to speak, rather than type, to the system. As the language is endangered, a spoken dialogue system would assist revitalization efforts by promoting oral fluency in language learners.
Full Text Available
Towards an Automatic Speech Recognizer for the Choctaw language

https://doi.org/10.21437/S4SG.2022-2

Brixey, Jacqueline; Traum, David (September 2022, ISCA)

Full Text Available
Comparing Approaches to Language Understanding for Human-Robot Dialogue: An Error Taxonomy and Analysis

Tur, Ada; Traum, David (June 2022, Proceedings of the Thirteenth Language Resources and Evaluation Conference)

In this paper, we compare two different approaches to language understanding for a human-robot interaction domain in which a human commander gives navigation instructions to a robot. We contrast a relevance-based classifier with a GPT-2 model, using about 2000 input-output examples as training data. With this level of training data, the relevance-based model outperforms the GPT-2 based model 79% to 8%. We also present a taxonomy of types of errors made by each model, indicating that they have somewhat different strengths and weaknesses, so we also examine the potential for a combined model.
Full Text Available
Evaluation of Off-the-shelf Speech Recognizers on Different Accents in a Dialogue Domain

Tadimeti, Divya; Georgila, Kallirroi; Traum, David (June 2022, Proceedings of the Thirteenth Language Resources and Evaluation Conference)

We evaluate several publicly available off-the-shelf (commercial and research) automatic speech recognition (ASR) systems on dialogue agent-directed English speech from speakers with General American vs. non-American accents. Our results show that the performance of the ASR systems for non-American accents is considerably worse than for General American accents. Depending on the recognizer, the absolute difference in performance between General American accents and all non-American accents combined can vary approximately from 2% to 12%, with relative differences varying approximately between 16% and 49%. This drop in performance becomes even larger when we consider specific categories of non-American accents, indicating a need for more diligent collection of and training on non-native English speaker data in order to narrow this performance gap. There are performance differences across ASR systems, and while the same general pattern holds, with more errors for non-American accents, there are some accents for which the best recognizer is different from the one in the overall case. We expect these results to be useful for dialogue system designers in developing more robust, inclusive dialogue systems, and for ASR providers in taking into account performance requirements for different accents.
Full Text Available
Spoken language interaction with robots: Recommendations for future research

https://doi.org/10.1016/j.csl.2021.101255

Marge, Matthew; Espy-Wilson, Carol; Ward, Nigel G.; Alwan, Abeer; Artzi, Yoav; Bansal, Mohit; Blankenship, Gil; Chai, Joyce; Daumé, Hal; Dey, Debadeepta; et al (January 2022, Computer Speech & Language)
Full Text Available
